
UCSAS 2022 - Hockey Analytics w/ Python

Venkata Patchigolla

October 2022

1 / 36

Today's Workshop

  • Introduction
  • Data Fetching
    • requests
  • Data Manipulation
    • numpy, pandas
  • Data Visualizations
    • matplotlib
2 / 36

Introduction

We will explore Python tools that let us analyze hockey data


Goal: Visualize where on the ice skaters took shots and scored


... but to do anything, we need a data source

3 / 36

Data Sources

Big Data Cup

Kaggle


Specialty sources

Their Hockey Counts

CHL microstats

NHL microstats


But today we will gather data straight from the source: the NHL API

4 / 36

NHL API

5 / 36

NHL API

http://statsapi.web.nhl.com/api/v1/game/2021020002/feed/live

Three parts of the Game ID*

  • Year: 2021 (the 2021-22 season)
  • Season type: 02 (regular season)
  • Game number: 0002
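The ID is just the three parts concatenated, so it can be built and taken apart with plain string operations. A small sketch; `make_game_id` and `parse_game_id` are hypothetical helper names for illustration, not part of the NHL API.

```python
def make_game_id(year: str, season_type: str, game_number: int) -> str:
    """Join the three parts into a 10-digit game ID."""
    # zfill pads the game number to 4 digits: 2 -> '0002'
    return year + season_type + str(game_number).zfill(4)


def parse_game_id(game_id: str) -> dict:
    """Split a 10-digit game ID back into year, season type, and game number."""
    if len(game_id) != 10 or not game_id.isdigit():
        raise ValueError(f'expected a 10-digit game ID, got {game_id!r}')
    return {
        'year': game_id[:4],
        'season_type': game_id[4:6],
        'game_number': int(game_id[6:]),
    }


print(make_game_id('2021', '02', 2))   # 2021020002
print(parse_game_id('2021020002'))     # {'year': '2021', 'season_type': '02', 'game_number': 2}
```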
6 / 36

NHL API

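import requests
# Game ID prefix: the 2021-22 season ('2021'), regular-season games ('02')
year = '2021'
season_type = '02'
# Upper bound on game numbers to request (a regular season has fewer games than this)
max_game_ID = 2000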
7 / 36

NHL API

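def get_game(i):
    # Build the 10-digit game ID, e.g. '2021' + '02' + '0002'
    game_ID = year + season_type + str(i).zfill(4)
    try:
        r = requests.get(url='http://statsapi.web.nhl.com/api/v1/game/' + game_ID + '/feed/live',
                         timeout=5)
        r.raise_for_status()
        return r.json()
    except (requests.RequestException, ValueError):
        # Network error, HTTP error status, or a non-JSON body: skip this game
        return None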
8 / 36

Speeding it up with multiprocessing

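import multiprocessing
# 20 worker processes fetch games in parallel
with multiprocessing.Pool(20) as pool:
    game_data = pool.map(get_game, range(1, max_game_ID))
# get_game returns None for failed or missing games; drop those
game_data = [g for g in game_data if g is not None]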
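Downloading games is mostly waiting on the network, so threads are a common, lighter-weight alternative to worker processes. A minimal sketch with the standard-library `concurrent.futures.ThreadPoolExecutor`; `fake_get_game` is a hypothetical stand-in for `get_game` so the example runs offline, and `fetch_all` is likewise an illustrative name.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_get_game(i):
    """Hypothetical stand-in for get_game: sleeps to mimic network latency,
    then returns a tiny dict, or None for 'missing' games."""
    time.sleep(0.01)
    return None if i % 5 == 0 else {'game_number': i}


def fetch_all(fetch, game_numbers, workers=20):
    # executor.map returns results in input order, like Pool.map
    with ThreadPoolExecutor(max_workers=workers) as executor:
        results = list(executor.map(fetch, game_numbers))
    # drop failed or missing games
    return [r for r in results if r is not None]


games = fetch_all(fake_get_game, range(1, 21))
print(len(games))  # 16
```

On real data, `fetch_all(get_game, range(1, max_game_ID))` would take the place of the `Pool.map` call. Threads also sidestep the pickling and `if __name__ == '__main__':` requirements that process pools have on platforms that spawn new processes.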
9 / 36

Extracting Data from the request

10 / 36

Extracting Data from the request

# Columns: event, x, y, team_o (offensive team), team_d (defending team), period,
# player_o (shooter/scorer), year, month, time_in_period, date_time
master_data = []
for data in game_data:
    teams = {
        "away": data['gameData']['teams']['away']['name'],
        "home": data['gameData']['teams']['home']['name']
    }
    teams_list = list(teams.values())
    plays = data['liveData']['plays']['allPlays']
    # Year and month of the game, from a timestamp like '2021-10-12T23:00:00Z'
    date = data['gameData']['datetime']['dateTime'].split('-')
    year = date[0]
    month = date[1]
11 / 36

Extracting Data from the request

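We only care about two event types: Shot and Goal

    # (continues inside the `for data in game_data` loop)
    event_types = ['Shot', 'Goal']
    for play in plays:
        event = play['result']['event']
        # keep only shots and goals that have rink coordinates
        if event in event_types and 'x' in play['coordinates'] and 'y' in play['coordinates']:
            x = play['coordinates']['x']
            y = play['coordinates']['y']
            TEAM = play.get('team', {}).get('name')
            if TEAM in teams_list:
                team_o = TEAM
                # the other entry of the two-team list is the defending team
                team_d = teams_list[(teams_list.index(TEAM) + 1) % 2]
                period = play['about']['period']
                # shooter or scorer, assumed to be the first entry in 'players'
                player_o = play['players'][0]['player']['fullName']
                time_in_period = play['about']['periodTime']
                date_time = play['about']['dateTime']
                master_data.append([event, x, y, team_o, team_d, period, player_o,
                                    year, month, time_in_period, date_time])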
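To check the extraction logic without calling the API, the same steps can be wrapped in a function and run on a tiny hand-built feed. This is a sketch: `extract_events` is a hypothetical name, the `sample` dictionary is made up and mimics only the keys used above (it is not a real API response), and reading the shooter or scorer from the first entry of `players` is an assumption about the feed.

```python
def extract_events(data, event_types=('Shot', 'Goal')):
    """Return one row per shot/goal in a single game's live feed."""
    teams_list = [data['gameData']['teams']['away']['name'],
                  data['gameData']['teams']['home']['name']]
    year, month = data['gameData']['datetime']['dateTime'].split('-')[:2]
    rows = []
    for play in data['liveData']['plays']['allPlays']:
        event = play['result']['event']
        coords = play.get('coordinates', {})
        team = play.get('team', {}).get('name')
        # skip other event types, plays without coordinates, or unknown teams
        if event not in event_types or 'x' not in coords or 'y' not in coords or team not in teams_list:
            continue
        team_d = teams_list[(teams_list.index(team) + 1) % 2]
        rows.append([event, coords['x'], coords['y'], team, team_d,
                     play['about']['period'], play['players'][0]['player']['fullName'],
                     year, month, play['about']['periodTime'], play['about']['dateTime']])
    return rows


# Made-up sample mimicking only the keys used above (not real game data)
sample = {
    'gameData': {'teams': {'away': {'name': 'Team A'}, 'home': {'name': 'Team B'}},
                 'datetime': {'dateTime': '2021-10-12T23:00:00Z'}},
    'liveData': {'plays': {'allPlays': [
        {'result': {'event': 'Shot'}, 'coordinates': {'x': 70.0, 'y': -10.0},
         'team': {'name': 'Team A'}, 'players': [{'player': {'fullName': 'Player One'}}],
         'about': {'period': 1, 'periodTime': '05:12', 'dateTime': '2021-10-12T23:10:00Z'}},
        {'result': {'event': 'Faceoff'}, 'coordinates': {'x': 0.0, 'y': 0.0},
         'team': {'name': 'Team B'}, 'players': [{'player': {'fullName': 'Player Two'}}],
         'about': {'period': 1, 'periodTime': '06:00', 'dateTime': '2021-10-12T23:11:00Z'}},
    ]}},
}

rows = extract_events(sample)
print(rows[0][:5])  # ['Shot', 70.0, -10.0, 'Team A', 'Team B']
```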
12 / 36

13 / 36

Data Manipulation

Once you have all the data, you can convert it into a pandas DataFrame for easier analysis

import pandas as pd
df = pd.DataFrame(master_data, columns=['event', 'x', 'y', 'team_o', 'team_d', 'period', 'player_o', 'year', 'month', 'time', 'datetime'])
14 / 36

Resulting DataFrame

15 / 36

Now the Fun part!

Now that we have the data in the form we want, we can start playing with it!

Let's say I want to rank skaters by the highest scoring average ... what should I do?

16 / 36

Scoring Average Rank for Skaters

17 / 36

Scoring Average Rank for Skaters
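One way to answer this with pandas, assuming "scoring average" means goals divided by shots on goal (Shot plus Goal rows, since the DataFrame keeps only those two event types, and assuming a goal is recorded only as a Goal event, not also as a Shot). `rank_skaters` and `min_attempts` are hypothetical names, and the rows below are made up.

```python
import pandas as pd

# Made-up rows; the real df from the previous slides has the same event / player_o columns
df = pd.DataFrame({
    'event':    ['Shot', 'Goal', 'Shot', 'Shot', 'Goal', 'Goal'],
    'player_o': ['Player A', 'Player A', 'Player A', 'Player B', 'Player B', 'Player C'],
})


def rank_skaters(df: pd.DataFrame, min_attempts: int = 1) -> pd.DataFrame:
    """Rank skaters by goals / (shots + goals)."""
    per_player = df.groupby('player_o').agg(
        goals=('event', lambda e: int((e == 'Goal').sum())),
        # df holds only Shot and Goal rows, so the row count is shots + goals
        attempts=('event', 'count'),
    )
    return (per_player[per_player['attempts'] >= min_attempts]  # drop tiny samples
            .assign(scoring_avg=lambda t: t['goals'] / t['attempts'])
            .sort_values('scoring_avg', ascending=False))


print(rank_skaters(df))
```

On a full season, a threshold such as `min_attempts=50` keeps a skater with one goal on one shot from topping the list.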

Now say I want to do the same ... but with teams this time

18 / 36

Scoring Average Rank for Teams

19 / 36

Scoring Average Rank for Teams
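For teams, a sketch that computes the same goals-per-shot ratio twice: while attacking (grouped by `team_o`) and while defending (grouped by `team_d`), so each team gets a scoring average for and against. The column names `scoring_avg_for` and `scoring_avg_against` are illustrative, and the rows are made up.

```python
import pandas as pd

# Made-up rows with the event / team_o / team_d columns of df
df = pd.DataFrame({
    'event':  ['Shot', 'Goal', 'Shot', 'Goal', 'Goal', 'Shot'],
    'team_o': ['Team A', 'Team A', 'Team A', 'Team B', 'Team B', 'Team B'],
    'team_d': ['Team B', 'Team B', 'Team B', 'Team A', 'Team A', 'Team A'],
})

is_goal = df['event'].eq('Goal')
# Mean of a boolean Series = fraction of rows that are goals
offense = is_goal.groupby(df['team_o']).mean().rename('scoring_avg_for')
defense = is_goal.groupby(df['team_d']).mean().rename('scoring_avg_against')
teams = pd.concat([offense, defense], axis=1).sort_values('scoring_avg_for', ascending=False)
print(teams)
```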

But this does not give us as much information as we want ... we can do better!

20 / 36

Display of Goals and Shots for Teams

21 / 36

Display of Goals and Shots for Teams
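A chart along these lines can be sketched by counting Shot and Goal rows per team with `pd.crosstab` and drawing grouped bars with `DataFrame.plot.bar`. The data is made up, and this is not necessarily how the figures on these slides were produced.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs anywhere; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows with the same event / team_o columns as df
df = pd.DataFrame({
    'event':  ['Shot', 'Goal', 'Shot', 'Goal', 'Goal', 'Shot'],
    'team_o': ['Team A', 'Team A', 'Team A', 'Team B', 'Team B', 'Team B'],
})

# One row per team, one column per event type, values are counts
counts = pd.crosstab(df['team_o'], df['event']).reindex(columns=['Shot', 'Goal'], fill_value=0)
counts = counts.sort_values('Goal', ascending=False)

ax = counts.plot.bar(figsize=(10, 4), rot=45)
ax.set_xlabel('Team')
ax.set_ylabel('Count')
ax.set_title('Shots and goals by team')
plt.tight_layout()
# In a script, call plt.show(); in a notebook the figure renders inline
```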

22 / 36

Display of Goals and Shots for Teams (Improved)

23 / 36

24 / 36

Visualizing Goals

Let's say I want to see where all the goals scored on the Tampa Bay Lightning were shot from. A quick pandas filter gets all the data I need:

# Goals where Tampa Bay was the defending team; keep only the shot coordinates
coors = df[(df['team_d'] == 'Tampa Bay Lightning') & (df['event'] == 'Goal')][['x', 'y']]
25 / 36

Coordinates from which opposing teams scored goals on the Tampa Bay Lightning

26 / 36

A note about the Coordinates!

X: -100 to 100 (feet along the length of the rink; 0 is center ice)

Y: -42.5 to 42.5 (feet across the width of the rink)
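Because the coordinates cover the whole rink, while the improved plot later uses a half-rink picture, a common preprocessing step is to fold every shot onto one end. The sketch below assumes center ice is (0, 0) and rotates points with x < 0 by 180 degrees, which keeps the shooter's left and right sides consistent (a plain x mirror would swap them). `fold_to_one_end` is a hypothetical helper.

```python
import pandas as pd


def fold_to_one_end(coords: pd.DataFrame) -> pd.DataFrame:
    """Rotate points with x < 0 by 180 degrees about center ice so all land at x >= 0."""
    folded = coords.copy()  # leave the caller's DataFrame untouched
    flip = folded['x'] < 0
    # (x, y) -> (-x, -y) keeps the shooter's left/right side consistent between ends
    folded.loc[flip, ['x', 'y']] = -folded.loc[flip, ['x', 'y']].to_numpy()
    return folded


sample = pd.DataFrame({'x': [-80.0, 60.0], 'y': [10.0, -5.0]})
print(fold_to_one_end(sample))
```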

27 / 36

Simple Visualization - Code
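A minimal matplotlib scatter-plot sketch, assuming `coors` holds the `x`/`y` columns from the filter above; the points here are made up so the block runs on its own. Fixing the axis limits to the rink size and using an equal aspect ratio keeps the rink's proportions undistorted.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for the sketch; omit in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up points standing in for the `coors` DataFrame from the filter above
coors = pd.DataFrame({'x': [-85, -70, 60, 78, 88], 'y': [3, -15, 20, -8, 1]})

fig, ax = plt.subplots(figsize=(10, 4.25))
ax.scatter(coors['x'], coors['y'], s=20, alpha=0.7)
# Full rink in feet, center ice at (0, 0)
ax.set_xlim(-100, 100)
ax.set_ylim(-42.5, 42.5)
ax.set_aspect('equal')
ax.set_xlabel('x (ft)')
ax.set_ylabel('y (ft)')
ax.set_title('Goals against (sample data)')
```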

28 / 36

Simple Visualization - Graph

29 / 36

Improved Visualization - Code

Place a picture of the rink underneath the scatter plot

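from PIL import Image, ImageOps  # ImageOps.mirror can flip the picture if the goal is on the wrong side
import numpy as np
image_file = "Hockey-field-half.png"
# Convert to RGBA so the array always has 4 channels, then resize to 400 x 340 pixels (width x height)
image = Image.open(image_file).convert('RGBA').resize((400, 340))
# Array shape is (height, width, channels) = (340, 400, 4)
image_arr = np.asarray(image)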
30 / 36

Improved Visualization - Code
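A sketch of the overlay: `imshow` with `extent` maps the picture's pixels onto rink coordinates, and the scatter is drawn on top. It assumes the picture shows one half of the rink from x = 0 to 100 ft and y = -42.5 to 42.5 ft (consistent with the 400 x 340 resize, since 100/85 = 400/340), and it uses a plain grey array as a stand-in for `Hockey-field-half.png` so it runs without the file.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for the sketch; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the rink picture: opaque light grey, same shape as image_arr (340, 400, 4)
image_arr = np.full((340, 400, 4), 220, dtype=np.uint8)
image_arr[..., 3] = 255
# Made-up goal locations already folded onto the x >= 0 half
coors = pd.DataFrame({'x': [60, 78, 85, 88], 'y': [20, -8, 4, 1]})

fig, ax = plt.subplots(figsize=(8, 6.8))
# extent=[left, right, bottom, top] places the picture in rink coordinates (feet)
ax.imshow(image_arr, extent=[0, 100, -42.5, 42.5], zorder=0)
ax.scatter(coors['x'], coors['y'], c='red', s=25, zorder=1)
ax.set_xlim(0, 100)
ax.set_ylim(-42.5, 42.5)
ax.axis('off')
```

With the real picture, the stand-in array is replaced by the `image_arr` built from the PNG; if the goal in the picture sits on the left, `ImageOps.mirror(image)` flips it horizontally before conversion.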

31 / 36

Improved Visualization - Graph

32 / 36

Improved Visualization - Graph

33 / 36

Visualization - Skaters

Visualizing where Alex Ovechkin takes his shots.

34 / 36

Visualization - Skaters

Visualizing where Alex Ovechkin scores.

35 / 36

Thank you!

36 / 36
